Synthetic gene coding for human granulocyte-colony stimulating factor for the expression in E. coli

ABSTRACT

The invention relates to a synthetic gene coding for hG-CSF which enables expression in E. coli with an improved expression level of the recombinant hG-CSF relative to the total cellular proteins after expression.

FIELD OF THE INVENTION

The present invention relates to a synthetic gene coding for human granulocyte-colony stimulating factor (hG-CSF) which enables expression in E. coli with an improved expression level, namely an expression level equal to or higher than 52% of the recombinant hG-CSF relative to the total proteins after expression.

hG-CSF belongs to a family of stimulating factors which regulate the differentiation and proliferation of hematopoietic mammalian cells. They have a major role in neutrophil formation and are therefore suitable for use in medicine in the fields of hematology and oncology.

Two forms of hG-CSF are currently available on the market for clinical use: lenograstim, which is glycosylated and is obtained by expression in a mammalian cell line, and filgrastim, which is non-glycosylated and is obtained by expression in the bacterium Escherichia coli (E. coli).

BACKGROUND OF THE INVENTION

The impact of several successive rare codons, such as the arginine codons (AGG/AGA; CGA), the leucine codon (CTA), the isoleucine codon (ATA) and the proline codon (CCC), on the level of translation, and consequently on the decrease in the amount and quality of the expressed protein in E. coli, is described in Kane J F, Current Opinion in Biotechnology, 6:494-500 (1995). Individual rare codons have a similar impact if they occur in different parts of the gene.
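As an illustration of how such codons can be located, the following minimal sketch scans an in-frame coding sequence for the rare codons named above. The function name `find_rare_codons` and the short test fragment are hypothetical and serve only as an example; the fragment is not taken from the hG-CSF gene.

```python
# Rare E. coli codons named in the text (after Kane, 1995), written as DNA.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}

def find_rare_codons(cds):
    """Return (codon_number, codon) for each rare codon in an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    hits = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in RARE_CODONS:
            hits.append((i // 3 + 1, codon))
    return hits

# Hypothetical fragment: ATG ACC CCC CTG AGG ATA AAA
example = "ATGACCCCCCTGAGGATAAAA"
print(find_rare_codons(example))
```

Runs of consecutive codon numbers in the result would indicate clustered rare codons of the kind discussed above.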

The GC rich regions also have an impact on the translational efficiency in E. coli if a stable double stranded RNA is formed in the mRNA secondary structure. This impact is highest when the GC rich regions of the mRNA are found in the RBS, in the direct proximity of the RBS, or in the direct proximity of the start codon (Makrides S C, Microbiological Reviews, 60:512-538 (1996); Baneyx F, Current Opinion in Biotechnology, 10:411-421 (1999)).

Several methods are known for predicting the secondary structure and calculating the minimal free energy of an individual RNA molecule, which is supposed to be the basic rule for the most stable/most probable structure (SantaLucia J Jr and Turner D H, Biopolymers, 44:309-319 (1997)). With the exception of some cases, reliable algorithms for the prediction of the correct secondary structure are not known. There has been no evidence for a quantitative correlation with the expression level (Smit M H and van Duin J, J. Mol. Biol., 244:144-150 (1994)). It is still impossible to predict the tertiary structures of RNA (Tinoco I and Bustamante C, J. Mol. Biol., 293:271-281 (1999)).

The increase of the expression level after optimization of the DNA sequence in the TIR region, in the RBS region and in the region between the start codon and the RBS region is described in McCarthy J E G and Brimacombe R, Trends Genet 10:402-407 (1994). In this case the expression level increased due to more efficient translation initiation and its smooth continuation in the mRNA coding region.

The production of adequate amounts of hG-CSF for performing in vitro biological studies by expression in E. coli is described in Souza L M et al, Science 232:61-65 (1986) and in Zsebo K M et al, Immunobiology 172:175-184 (1986). The hG-CSF expression level was lower than 1%.

U.S. Pat. No. 4,810,643 discloses the use of a synthetic gene coding for hG-CSF which was constructed primarily on the basis of the replacement of E. coli rare codons with E. coli preference codons. The combination with a thermoinducible phage lambda promoter led to an expression level of 3 to 5% of hG-CSF relative to the total cellular proteins. This level is not sufficient for economical large-scale production of hG-CSF.

An accumulation of 8-10% of hG-CSF relative to total cellular proteins was reached by changing the first four codons in the 5′ end region of hG-CSF, as described in Wingfield P et al, Biochem. J, 256:213-218 (1988).

The expression of hG-CSF in E. coli with a yield of up to 17% of hG-CSF relative to total cellular bacterial proteins is described in Devlin P E et al, Gene 65:13-22 (1988). Such a yield was reached with partial optimization of the DNA sequence at the 5′ end of the G-CSF coding region (codons coding for the first four amino acids), whereby the GC region was replaced with an AT region and a relatively strong lambda phage promoter was used. This expression level is not very high, which leads to lower production yields and is less economical in large-scale production.

The use of a synthetic gene and an expression level of about 30% are described in Kang S H et al, Biotechnology Letters, 17(7):687-692 (1995). This level was attained by the introduction of E. coli preference codons, by modifications in the TIR region and by additional modifications of codon sets, whereby the 3′ end of the gene was not essentially changed. Thus, changes of the gene in the TIR region were needed to attain the stated expression level, and the expression level did not exceed 30%.

U.S. Pat. No. 5,840,543 describes a synthetic gene coding for hG-CSF which was constructed by the introduction of AT rich regions at the 5′ end of the gene and by the replacement of E. coli rare codons with E. coli preference codons. Under the control of the Trp promoter, expression with a yield of 11% hG-CSF relative to total cellular proteins was reached. On the other hand, the addition of leucine and threonine or their combination to the fermentation medium (in which the bacteria were cultivated) led to the accumulation of up to 35% of hG-CSF relative to total cellular proteins. Such an expression level was therefore reached by the addition of amino acids to the fermentation medium, which is an additional cost in the process for production of hG-CSF and is not economical for industrial production. Optimization of the gene coding for hG-CSF alone did not enable a higher expression level of hG-CSF.

The highest accumulation of hG-CSF relative to total cellular proteins found in the prior art is 48%, as described in Jeong et al, Protein Expression and Purification, 23:311-318 (2001). Such accumulation was obtained by changes in the N-terminal end and by induction with 1 mM IPTG.

In general, there are no reports on possible predictions of the expression level of native human genes in prokaryotic organisms, e.g. the bacterium E. coli. The described expression levels are relatively low or difficult to detect even when expression plasmids with strong promoters, e.g. from lambda or T7 phage, are used. From the prior art literature it can be gathered that many parameters (rare codons or their clustering, GC rich regions, unfavorable mRNA secondary structures, unstable mRNA) have an impact on the accumulation of a human protein in E. coli.

Until now no fully developed rule has been known on how to combine codons in order to obtain secondary or tertiary mRNA structures which are optimal for expression. Although some mathematical and structural models exist for predicting the thermodynamic stability of secondary structures, they are too unreliable to predict the secondary structures themselves. There are no such models for predicting tertiary structures. The currently accessible models therefore do not enable prediction of the impact of the codons on the expression level.

There are no reports in either the patent or the scientific literature on a more efficient way of solving the problem of the low expression level of the native gene coding for hG-CSF in E. coli.

SUMMARY OF THE INVENTION

It is thus an object of the present invention to provide a DNA sequence coding for hG-CSF or biologically active G-CSF, which DNA sequence enables an improved expression level (accumulation) in E. coli, and to provide a process for the construction of such a DNA sequence.

The object is solved by a DNA sequence according to claim 1, and by a process for the construction of such a DNA sequence according to claim 15. The present invention also provides an expression plasmid according to claim 6 or 7, an expression system according to claim 11 or 12, a process for the expression of hG-CSF according to claim 20 and a process for the manufacture of a pharmaceutical composition according to claim 24. Preferred embodiments are defined in sub-claims.

The significant feature of the present invention is that the use of the synthetic gene coding for hG-CSF makes it possible to attain an expression level (accumulation) in E. coli equal to or higher than 52% of recombinant hG-CSF relative to the total proteins in E. coli. Preferably, an expression plasmid containing a strong T7 promoter is used for the expression. The synthetic gene coding for hG-CSF is constructed by using a complex combination of two methods which enable the construction of an optimized synthetic gene (coding for hG-CSF) for its expression in E. coli. The first method includes the replacement of some rare E. coli codons, which are unfavorable for expression in E. coli, by E. coli preference codons, which are more favorable for expression in E. coli. The second method includes the replacement of some GC rich regions by AT rich regions. Some parts of the synthetic gene of the present invention are constructed by using one of the two methods; for some parts, the combination of the two methods is used; and some parts of the gene are not changed. In the construction procedure of the synthetic gene coding for hG-CSF, which is also a subject of the present invention, the non-coding (5′-untranslated) regions are preferably not changed. Advantageously, this means that there are no modifications in the translation initiation region (TIR), in the ribosome binding site (RBS), or in the region between the start codon and the RBS.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically shows an optimized construction of a synthetic gene coding for hG-CSF according to a preferred embodiment of the present invention.

FIG. 2 shows the DNA sequence of the native gene coding for hG-CSF (FIG. 2A) (GenBank: NM_000759) and the DNA sequence of the optimized (Fopt5) gene coding for hG-CSF (FIG. 2B). The bases which differ from the native gene are shown in bold.

FIG. 3 shows an SDS-PAGE analysis of samples of proteins obtained from the expression of the native hG-CSF DNA sequence (lanes 1 to 4) and of the optimized (Fopt5) gene coding for hG-CSF (lanes 6 and 7) in induced and noninduced cultures of E. coli, as evaluated by dye staining (FIG. 3A) and by Western blot using an antibody specific for hG-CSF protein (FIG. 3B).

FIG. 4 shows an SDS-PAGE analysis of samples of proteins obtained from the expression of the optimized (Fopt5) gene coding for hG-CSF in an induced culture of E. coli, as evaluated by dye staining.

FIG. 5 shows an SDS-PAGE analysis of samples of proteins obtained from the expression of the optimized (Fopt5) gene coding for hG-CSF in an induced culture of E. coli according to an alternative embodiment, as evaluated by dye staining.

DESCRIPTION OF THE INVENTION AND THE PREFERRED EMBODIMENTS THEREOF

It has been found that the problem of the low expression level of the gene coding for hG-CSF in E. coli can be solved by optimization of the gene sequence coding for hG-CSF. The native gene coding for hG-CSF is changed, leading to the construction of a particular synthetic gene coding for hG-CSF. The particular synthetic gene is defined by the DNA sequence of SEQ ID NO: 1 or by a nucleotide sequence comprising suitable modifications of SEQ ID NO: 1 or of the native hG-CSF gene sequence.

In comparison with the data described in the art, a surprisingly high expression level can be obtained according to the present invention.

The term ‘hG-CSF’, as used herein, refers to human granulocyte-colony stimulating factor, comprising the recombinant hG-CSF obtained by expression in E. coli.

The synthetic gene encoding hG-CSF of the present invention was obtained by introducing changes in the nucleotide sequence of the gene encoding the native hG-CSF. The amino acid sequence was thereby not changed and remained identical to that of the native hG-CSF.

The present invention further comprises a process for the expression of the synthetic gene in E. coli and concerns the level of expression of the synthetic gene.

The term ‘expression level’, as used herein, refers to the proportion of hG-CSF obtained after the heterologous expression of the gene encoding hG-CSF relative to the total cellular proteins after expression. The expression level may be determined by quantifying appropriately separated proteins after expression, e.g. by quantifying the staining of protein bands separated by SDS-PAGE.
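The definition above amounts to a simple ratio. The following sketch shows that arithmetic under the assumption that the stained bands of one SDS-PAGE lane have already been integrated by densitometry; the function name and all intensity values are hypothetical and are not data from this document.

```python
def expression_level(target_band, all_bands):
    """Percentage of the total lane signal contributed by the target band."""
    total = sum(all_bands)
    if total <= 0:
        raise ValueError("total lane intensity must be positive")
    return 100.0 * target_band / total

# Hypothetical integrated band intensities (arbitrary units) for one lane.
# The hG-CSF band is included in the lane total.
hgcsf_band = 5200.0
lane_bands = [5200.0, 1800.0, 1500.0, 900.0, 600.0]

level = expression_level(hgcsf_band, lane_bands)
print(f"{level:.1f}% of total stained protein")
```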

The term ‘heterologous expression’, as used herein, refers to the expression of genes which are foreign to the organism in which the expression occurs.

The term ‘homologous expression’, as used herein, refers to the expression of genes which are native to the organism in which the expression occurs.

The term ‘preference codons’, as used herein, refers to the codons used by an individual organism (e.g. E. coli) for the production of most mRNA molecules. The organism uses these codons for expressing genes with high homologous expression.

The term ‘rare codons’, as used herein, refers to the codons used by an individual organism (e.g. E. coli) only for expressing genes with a low expression level. These codons are rarely used in the organism (low homologous expression).

The term ‘GC rich regions’, as used herein, refers to the regions in the gene where the bases guanine (G) and cytosine (C) prevail.

The term ‘AT rich regions’, as used herein, refers to the regions in the gene where the bases adenine (A) and thymine (T) prevail.

The term ‘synthetic gene’, as used herein, refers to a gene prepared from short double stranded DNA fragments which are composed of synthetic complementary oligonucleotides. This synthetic gene differs from the native gene (e.g., cDNA) only in the nucleotide sequence, whereby the amino acid sequence remains unchanged. The synthetic gene is obtained by techniques of recombinant DNA technology.

The term ‘native gene’, as used herein, refers to the DNA sequence of a gene which is identical to the native DNA sequence.

The term ‘segment’, as used herein, refers to the parts of the gene which are bounded by single restriction sites on both ends. These sites serve as subcloning sites for the synthetically constructed parts of the gene. In the following, the restriction sites are numbered according to the nucleotide position in the 5′-3′ direction from the start codon.

The term ‘segment I’, as used herein, refers to the 5′ end of the gene encoding hG-CSF between the nucleotide positions 3 and 194 (notably the restriction sites NdeI (3) and SacI (194)), i.e. a 191 bp long sequence. Segment I may be de novo synthesized.

The term ‘segment II’, as used herein, refers to the part of the gene for hG-CSF between the nucleotide positions 194 and 309 (notably the restriction sites SacI (194) and ApaI (309)), i.e. the 115 bp long central part of the gene. Segment II may be de novo synthesized.

The term ‘segment III’, as used herein, refers to the part of the gene for hG-CSF between the nucleotide positions 309 and 467 (notably the restriction sites ApaI (309) and NheI (467)), i.e. a 158 bp long part of the gene where the native DNA sequence for hG-CSF is preserved with the exception of the codons for Arg148 and Gly150.

The term ‘segment IV’, as used herein, refers to the 3′ terminal end of the gene encoding hG-CSF between the nucleotide positions 467 and 536 (notably the restriction sites NheI (467) and BamHI (536)), i.e. the 69 bp long terminal part of the gene. Segment IV may be de novo synthesized.
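The segment lengths stated above follow directly from the restriction site positions. The short sketch below recomputes them; the variable names are illustrative only.

```python
# Restriction site positions as given in the segment definitions above.
SITES = [("NdeI", 3), ("SacI", 194), ("ApaI", 309), ("NheI", 467), ("BamHI", 536)]
SEGMENT_NAMES = ["I", "II", "III", "IV"]

segments = {}
# Pair each site with the next one: NdeI-SacI, SacI-ApaI, ApaI-NheI, NheI-BamHI.
for name, (left, right) in zip(SEGMENT_NAMES, zip(SITES, SITES[1:])):
    segments[name] = {
        "sites": (left[0], right[0]),
        "length_bp": right[1] - left[1],
    }

for name, info in segments.items():
    print(f"segment {name}: {info['sites'][0]}-{info['sites'][1]}, {info['length_bp']} bp")
```

The four lengths add up to the 533 bp between the NdeI (3) and BamHI (536) positions.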

The synthetic gene encoding hG-CSF of the present invention is constructed by the combination of the following methods:

-   replacement of E. coli rare codons with E. coli preference codons: in the segment II (between restriction sites SacI (194) and ApaI (309)) and in the segment IV (between restriction sites NheI (467) and BamHI (536));
-   replacement of GC rich regions with AT rich regions, whereby the rarest E. coli codons are replaced, but mostly not with the E. coli preference codons: in the segment I (between restriction sites NdeI (3) and SacI (194));
-   completely unchanged native sequence of 46 codons (between CCC for Pro102 and CGC for Arg147) in the segment III;
-   replacement of two E. coli rare codons (CGG→CGT (Arg148) and GGA→GGT (Gly150)) at the terminal end of the segment III.

Optimization of the gene coding for hG-CSF of the present invention does not include changes in the TIR, in the RBS or in the region between the start codon and the RBS.

The synthetic gene of the present invention encoding hG-CSF enables its expression in E. coli with an expression level equal to or higher than 52%. Furthermore, an expression level of about 55% or even about 60% can also be obtained. The high expression level of the synthetic gene coding for hG-CSF of the present invention enables high yields of hG-CSF production, faster and simpler purification and isolation of heterologous hG-CSF and easier in-process control, and makes the whole production process more economical. Therefore, the efficient production of hG-CSF on an industrial scale is enabled. The produced hG-CSF is suitable for clinical use in medicine.

The construction of the synthetic gene of the present invention begins with the initial preparation of the hG-CSF native gene and of the plasmids. The gene coding for native hG-CSF can be of human origin, but the same principle can be used for every gene which is homologous in the regions which comprise the single restriction sites used for subcloning of de novo synthesized gene segments. The plasmid for mutagenesis was chosen for its ability to enable the successive introduction of point mutations. Selection or enrichment of the plasmids containing the desired mutation was obtained by using an additional selection primer that changed the unique restriction site EcoRI into EcoRV or vice versa (Transformer™ Site-Directed Mutagenesis Kit (Clontech)). The gene and the plasmid are constructed in such a way that the introduction of point mutations by cassette mutagenesis is possible.

After the initial preparation of the native gene coding for hG-CSF and of the plasmids, the optimization of the native gene coding for hG-CSF is performed. This means that the synthetic gene coding for hG-CSF is constructed. The optimization begins with the division of the native gene coding for hG-CSF into four segments (I, II, III and IV), which are or will be separated by single restriction sites after the oligonucleotide mutagenesis, and the changes are introduced in the individual segments. In some individual segments changes in the gene sequence are introduced, whereas in certain segments the gene is not changed (FIG. 1). The obtained optimized synthetic gene coding for hG-CSF therefore consists of a partially preserved native sequence (segment III) and of 5′ and 3′ coding regions which are synthesized de novo (segments I, II and IV).

The changes in the individual segments:

Segment I: Replacement of E. coli rare codons with E. coli preference codons and replacement of GC rich regions with AT rich regions

Italic: GC/AT rich replacement; Italic and underlined: rare/preference codon replacements and GC/AT rich replacement; underlined: rare/preference codon replacements; Gly101 (GGT→GGG) introduction of ApaI (309) restriction site.

Thr2 (ACC→ACA), Pro3 (CCC→CCA), Gly5 (GGC→GGT), Pro6 (CCT→CCA), Ala7 (GCC→GCT), Ser8 (AGC→TCT), Ser9 (TCC→TCT), Pro11 (CCC→CCG), Gln12 (CAG→CAA), Phe14 (TTC→TTT), Leu16 (CTC→TTG), Lys17 (AAG→AAA), Cys18 (TGC→TGT), Glu20 (GAG→GAA), Val22 (GTG→GTT), Arg23 (AGG→CGT), Lys24 (AAG→AAA), Ile25 (ATC→ATT), Gln26 (CAG→CAA), Gly27 (GGC→GGT), Gly29 (GGC→GGT), Ala31 (GCG→GCT), Leu32 (CTC→TTA), Gln33 (CAG→CAA), Glu34 (GAG→GAA), Lys35 (AAG→AAA), Ala38 (GCC→GCA), Thr39 (ACC→ACT), Tyr40 (TAC→TAT), Lys41 (AAG→AAA), Cys43 (TGC→TGT), His44 (CAC→CAT), Pro45 (CCC→CCA), Glu46 (GAG→GAA), Glu47 (GAG→GAA), Val49 (GTG→GTT), Leu51 (CTC→TTA), Gly52 (GGA→GGT), His53 (CAC→CAT), Gly56 (GGC→GGT), Ile57 (ATC→ATT), Pro58 (CCC→CCG), Pro61 (CCC→CCT)
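The effect of these replacements on base composition can be illustrated for the first few codons of segment I. In the sketch below, codons 2 to 12 are reconstructed from the replacement list above; treating codons 4 and 10 as unchanged CTG is an assumption based on the cassette A oligonucleotide given in the Examples, not a sequence read from FIG. 2. The function name `gc_fraction` is illustrative.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

# Codons 2 to 12 (Thr2 ... Gln12). Codons 4 and 10 (Leu, CTG) are assumed
# unchanged; all other native/optimized pairs come from the list above.
native_codons = ["ACC", "CCC", "CTG", "GGC", "CCT", "GCC",
                 "AGC", "TCC", "CTG", "CCC", "CAG"]
optimized_codons = ["ACA", "CCA", "CTG", "GGT", "CCA", "GCT",
                    "TCT", "TCT", "CTG", "CCG", "CAA"]

native = "".join(native_codons)
optimized = "".join(optimized_codons)

print(f"native GC fraction:    {gc_fraction(native):.3f}")
print(f"optimized GC fraction: {gc_fraction(optimized):.3f}")
```

For this 33-base stretch the G+C count drops from 26 to 19 bases, which is consistent with the stated aim of replacing GC rich regions with AT rich regions at the 5′ end.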

Segment II: Replacement of E. coli rare codons with E. coli preference codons.

Cys65 (TGC→TGT), Pro66 (CCC→CCG), Ala69 (GCC→GCG), Leu76 (TTG→CTG), Leu79 (CTC→CTG), Gly82 (GGC→GGT), Leu83 (CTT→CTG), Phe84 (TTC→TTT), Leu85 (CTC→CTG), Tyr86 (TAC→TAT), Gly88 (GGG→GGT), Leu89 (CTC→CTG), Ala92 (GCC→GCG), Gly95 (GGG→GGC), Ile96 (ATA→ATT), Pro98 (CCC→CCG), Glu99 (GAG→GAA), Leu100 (TTG→CTG), Gly101 (GGT→GGG)

Segment III: Replacement of two E. coli rare codons situated just before the restriction site NheI

Arg148 (CGG→CGT), Gly150 (GGA→GGT)

Segment IV: Replacement of a long cluster of E. coli rare codons at the terminal end of the gene with E. coli preference codons.

Gln159 (CAG→CAA), Ser160 (AGC→TCT), Phe161 (TTC→TTT), Glu163 (GAG→GAA), Val164 (GTG→GTT), Ser165 (TCG→AGC), Tyr166 (TAC→TAT), Arg167 (CGC→CGT), Leu169 (CTA→CTG), Arg170 (CGC→CGT), His171 (CAC→CAT), Leu172 (CTT→CTG), Ala173 (GCG→GCT), Pro175 (CCC→CCG)
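Because the amino acid sequence is to remain unchanged, every replacement listed above should be synonymous. The following sketch checks a selection of the listed replacements against the standard genetic code; the helper name `is_synonymous` is hypothetical, and only a few entries from each segment are included to keep the example short.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_TO_ONE = {"Arg": "R", "Gly": "G", "Ile": "I",
                "Leu": "L", "Pro": "P", "Ser": "S"}

# (residue, native codon, replacement codon), copied from the lists above.
CHANGES = [
    ("Ser8", "AGC", "TCT"), ("Leu16", "CTC", "TTG"), ("Arg23", "AGG", "CGT"),
    ("Ile96", "ATA", "ATT"), ("Gly101", "GGT", "GGG"),
    ("Arg148", "CGG", "CGT"), ("Gly150", "GGA", "GGT"),
    ("Ser165", "TCG", "AGC"), ("Leu172", "CTT", "CTG"), ("Pro175", "CCC", "CCG"),
]

def is_synonymous(residue, old, new):
    """True if both codons encode the amino acid named in the residue label."""
    aa = THREE_TO_ONE[residue[:3]]
    return CODON_TABLE[old] == aa and CODON_TABLE[new] == aa

for residue, old, new in CHANGES:
    status = "synonymous" if is_synonymous(residue, old, new) else "NOT synonymous"
    print(f"{residue}: {old} -> {new} {status}")
```

The same check can be extended to the full lists by adding the remaining amino acids to `THREE_TO_ONE`.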

After the construction of the synthetic gene coding for hG-CSF, the optimized synthetic gene is subcloned in the final plasmid vector suitable for expression in E. coli. Preferably, the plasmid vector is selected from the group of pET vectors (available from Novagen). These vectors contain a strong T7 promoter. More preferably the plasmid vector pET3a comprising an ampicillin resistance gene, and particularly the plasmid vector pET9a comprising a kanamycin resistance gene, is used. The expression plasmid which is thereby constructed is then transformed into an appropriate E. coli production strain. Preferably, the E. coli production strain is selected from the group of strains which carry the gene for T7 RNA polymerase on the chromosome or on an expression plasmid. Most preferably, E. coli BL21 (DE3) is used.

The procedure is continued with the preparation of the inoculum and with the fermentation process in a suitable culture medium. Preferably, IPTG is used for induction, suitably at a concentration in the range of about 0.1 mM to about 1 mM, preferably at a concentration of about 0.3 to 0.6 mM. The fermentation can be performed at about 37° C., but is preferably performed below 30° C., more preferably at about 20 to 30° C., particularly at about 25° C. Performing the fermentation process at such a lower temperature than conventionally used can advantageously assist in the accumulation of precursor molecules of biologically active G-CSF in inclusion bodies.

The fermentation process may be performed in the presence or in the absence of the antibiotic that corresponds to the resistance gene inserted into the plasmid vector, e.g. with ampicillin or kanamycin at an appropriate concentration, or in the absence thereof. It has been found that the fermentation, and thus the accumulation of hG-CSF, was highly effective also without selection pressure.

The accumulated heterologous hG-CSF is found in the inclusion bodies and is suitable for the renaturation process and for use in the isolation procedures.

Suitable techniques for the isolation and/or purification of the hG-CSF or biologically active G-CSF protein are known to the person skilled in the art and can be used, e.g., classical or expanded-bed chromatography using any of the well known principles, e.g., ion-exchange, hydrophobic-interaction, affinity or size-exclusion, as well as continuous and batch-mode extractions using appropriate matrices or solutions. The preferred technique is immobilised metal affinity chromatography (IMAC), as it enables a highly efficient preparation of pure and biologically active protein in high yield and under native conditions.

The isolated and/or purified hG-CSF or biologically active G-CSF obtained according to the present invention can be used in a process for the manufacture of a pharmaceutical composition containing it as an effective ingredient. The pharmaceutical composition comprises an amount of hG-CSF or biologically active G-CSF that is therapeutically effective to treat a desired disease in a patient.

Suitable pharmaceutically acceptable carrier or auxiliary substances include suitable diluents, adjuvants and/or carriers useful in G-CSF therapy.

Biologically active G-CSF obtained by using the process of the present invention can be used for the preparation of medicaments for indications selected from the group comprising: neutropenia and neutropenia-related clinical sequelae, reduction of hospitalisation for febrile neutropenia after chemotherapy, mobilisation of hematopoietic progenitor cells, as an alternative to donor leukocyte infusion, chronic neutropenia, neutropenic and non-neutropenic infections, transplant recipients, chronic inflammatory conditions, sepsis and septic shock, reduction of risk, morbidity, mortality and number of days of hospitalisation in neutropenic and non-neutropenic infections, prevention of infection and infection-related complications in neutropenic and non-neutropenic patients, prevention of nosocomial infection and reduction of the mortality rate and the frequency rate of nosocomial infections, enteral administration in neonates, enhancing the immune system in neonates, improving the clinical outcome in intensive care unit patients and critically ill patients, wound/skin ulcers/burns healing and treatment, intensification of chemotherapy and/or radiotherapy, pancytopenia, increase of anti-inflammatory cytokines, shortening of intervals of high-dose chemotherapy by the prophylactic employment of filgrastim, potentiation of the anti-tumour effects of photodynamic therapy, prevention and treatment of illness caused by different cerebral dysfunctions, treatment of thrombotic illness and its complications, and post irradiation recovery of erythropoiesis.

It can also be used for the treatment of all other illnesses for which G-CSF is indicated.

The pharmaceutical composition containing the pure and biologically active G-CSF obtained by the process of the invention can thus be administered, in a manner known to those skilled in the art, to patients in a therapeutically effective amount to treat the above mentioned diseases.

The present invention will be explained in more detail by the examples below and by reference to the accompanying drawings, which examples and drawings are however merely illustrative and shall not be considered as limiting the present invention.

EXAMPLES

Example 1 Construction of the Optimal Gene: Fopt5

Example 1a The Initial Gene and Plasmid Preparations

The gene coding for hG-CSF was amplified from BBG13 (R&D) by PCR, which was also used to introduce, by means of the start oligonucleotides, the restriction sites NdeI and BamHI at the start and terminal end of the gene. The gene was then incorporated in the plasmid pCytexΔH,H (see the description below) between the restriction sites NdeI and BamHI. All other optimization steps for the expression of the gene in E. coli were also performed in this plasmid.

During the initial gene preparation the EcoRV restriction site was eliminated by point mutation (oligo M20z108). This was done to make it possible to introduce (individual) mutations by oligonucleotide-directed mutagenesis in the plasmid pCytexΔH,H with the Transformer™ Site-Directed Mutagenesis Kit (Clontech). The selection of mutants in the plasmid pCytexΔH,H-G-CSF via the restriction sites EcoRI/EcoRV was therefore possible.

The starting plasmid pCYTEXP1 (Medac, Hamburg) was reconstructed so as to enable constitutive expression. This was performed by excision of the part of the gene coding for the cI857 repressor between the two HindIII restriction sites. The obtained plasmid was named pCytexΔH,H.

The oligonucleotide for the elimination of the EcoRV site from the gene coding for hG-CSF: M20z108 5′-CCT GGA AGG AAT ATC CCC CG-3′
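The effect of the M20z108 oligonucleotide can be checked with a simple string search for the EcoRV recognition sequence GATATC. In the sketch below, the "assumed native" stretch is a reconstruction: it takes the oligonucleotide and restores the native Gly95 codon GGG given in the segment II codon list. It is an assumption for illustration and is not copied from FIG. 2.

```python
ECORV = "GATATC"  # EcoRV recognition sequence

def has_site(seq, site=ECORV):
    """True if the recognition sequence occurs in seq (spaces ignored)."""
    return site in seq.replace(" ", "").upper()

# M20z108 as listed above.
mutagenic_oligo = "CCT GGA AGG AAT ATC CCC CG"
# Assumed native counterpart (hypothetical reconstruction): Gly95 as GGG
# instead of GGA, followed by the native Ile96 codon ATA.
assumed_native = "CCT GGA AGG GAT ATC CCC CG"

differences = sum(a != b for a, b in zip(mutagenic_oligo, assumed_native))

print("EcoRV site in assumed native stretch:", has_site(assumed_native))
print("EcoRV site in M20z108:", has_site(mutagenic_oligo))
print("bases changed:", differences)
```

Under this reconstruction a single G→A change in the Gly95 codon (GGG→GGA, still glycine) removes the site.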

Example 1b Codon Optimization (FIG. 1)

In the first optimization step the synthetic gene between the restriction sites NdeI and SacI was constructed by ligation of five cassettes (A, B, C, D, E) which were composed of complementary oligonucleotides. This synthetic part of the gene represents the segment I. The segment I replaced the part of the native gene for hG-CSF between the restriction sites NdeI and SacI. This was performed by the excision of the first part of the gene between the restriction sites NdeI and SacI and its replacement with the synthetically prepared cassette. The process was performed in two steps. In the first step, the cassette A was ligated to the NdeI site and the cassette E was ligated to the SacI site. After 16 hours at 16° C. the ligation mixture was precipitated with ethanol to remove the excess of (unbound) oligonucleotides. In the second step the central part of the whole cassette (cassettes B, C and D), made up of the three previously ligated complementary oligonucleotides, was added and the ligation was performed for 16 hours at 16° C.

In the second optimization step the two codons most critical for E. coli located in the segment III, namely CGG→CGT (Arg148) and GGA→GGT (Gly150), were replaced by oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)).

In the third optimization step the segment IV was constructed in a similar way as the segment I, with the exception of the intermediate ethanol precipitation. The segment IV represents the last part of the gene between the restriction sites NheI and BamHI and is composed of two pairs of complementary oligonucleotides (cassettes F and G).

In the fourth step of optimization the rare codon coding for Ile96 was replaced (ATA→ATT) (segment II) by oligonucleotide-directed mutagenesis (Transformer™ Site-Directed Mutagenesis Kit (Clontech)) and the restriction site for ApaI (309) (GGT→GGG (Gly101)) was introduced at the 3′ end of the segment II. The ApaI restriction site was then used in the fifth optimization step to replace the native gene between SacI and ApaI with the synthetic DNA (segment II). This synthetic DNA is composed of three pairs of complementary oligonucleotides (cassettes H, I and J). This was performed similarly as in the first step, with the later addition of the cassette I.

1^(st) Optimization Step:

Complementary Pairs of Oligonucleotides (NdeI-SacI; Segment I in FIG.1):

Cassette A: Composed of Complementary Oligonucleotides zg1os1 in sp1os2: zg1os1 5′ TAT GAC ACC ACT GGG TCC AGC TTC TTC TCT GCC GCA AAG 3′sp1os2 5′ GCA GAG AAG AAG CTG GAC CCA GTG GTG TCA 3′

Cassette B: Composed of Complementary Oligonucleotides zg2os3 in sp2os4:zg2os3 5′ CTT TCT GTT GAA ATG TTT AGA ACA AGTTCG TAA AAT TCA AG 3′sp2os4 5′ GAA CTT GTT CTA AAC ATT TCA ACA GAA AGC TTT GCG 3′

Cassette C: Composed of Complementary Oligonucleotides zg3os5 in sp3os6:zg3os5 5′ GTG ATG GTG CAG CTT TAC AAG AAA AAC TGT GTG 3′ sp3os6 5′ GTTTTT CTT GTA AAG CTG CAC CAT CAC CTT GAA TTT TAC 3′

Cassette D: Composed of Complementary Oligonucleotides zg4os7 in sp4os8:zg4os7 5′ CAA CTT ATA AAC TGT GTC ATC CAG AAG AAC TGG TTC TGT TAG 3′sp4os8 5′ CAG TTC TTC TGG ATG ACA CAG TTT ATA AGT TGC ACA CA 3′

Cassette E: Composed of Complementary Oligonucleotides zg5os9 insp5os10: zg5os9 5′ GTC ATT CTC TGG GTA TTC CGT GGG CTC CTC TGA GCT 3′sp5os10 5′ CAG AGG AGC CCA CGG AAT ACC CAG AGA ATG ACC TAA CAG AAC 3′2^(nd) Optimization Step: Oligonucleotides for the Replacement of theMost Critical Codons by Using the Oligonucleotide-Directed Mutagenesis

replacement CGG→CGT (Arg 148) and GGA→GGT (Gly 150) m38os16 5′ CTC TGCTTT CCA GCG CCG TGC AGG TGG GGT CCT GGT TG 3′3^(rd) Optimization Step: Complementary Pairs of Nucleotides(NheI-BamHI; Segment IV on FIG. 1):

Cassette F: Composed of Complementary Nucleotides zg6os11 in sp6os12:zg6os11 5′ CTA GCC ATC TGC AAT CTT TTC TGG AAG TTA G 3′ sp6os12 5′ ACGATA GCT AAC TTC CAG AAA AGA TTG CAG ATG G 3′

Cassette G: Composed of Complementary Oligonucleotides zg7os13 and sp7os14:

zg7os13 5′ CTA TCG TGT TCT GCG TCA TCT GGC TCA GCC GTG ATA AG 3′

sp7os14 5′ GAT CCT TAT CAC CGC TGA GCC AGA TGA CGC AGA AC 3′

4^(th) Optimization Step: Oligonucleotides for the Introduction of ApaI (309) (GGT→GGG (Gly101)), and the Replacement of the Rare Codon ATA→ATT (Ile96) by Using the Oligonucleotide-Directed Mutagenesis

insertion of ApaI (309) (GGT→GGG (Gly101)), and replacement ATA→ATT (Ile96):

Apalos15 5′ GCC CTG GAG GGG ATT TCC CCC GAG TTG GGG CCC ACC TTG GAC AC 3′

5^(th) Optimization Step: Complementary Pairs of Oligonucleotides (SacI-ApaI; Segment II in FIG. 1):

Cassette H: Composed of Complementary Oligonucleotides zg8os18 and sp8os19:

zg8os18 5′ CCT GTC CGA GCC AGG CGC TGC AGC TGG CAG GCT CCC TGA G 3′

sp8os19 5′ CCT GCC AGC TGC AGC GCC TGG CTC GGA CAG GAG CT 3′

Cassette I: Composed of Complementary Oligonucleotides zg9os20 and sp9os21:

zg9os20 5′ CCA ACT GCA TAG CGG TCT GTT TCT GTA TCA GGG TCT GCT G 3′

sp9os21 5′ CTG ATA CAG AAA CAG ACC GCT ATG CAG TTG GCT CAG GCA G 3′

Cassette J: Composed of Complementary Oligonucleotides zg10os22 and sp10os23:

zg10os22 5′ CAG GCG CTG GAA GGC ATT TCC CCG GAA CTG GGG CC 3′

sp10os23 5′ CCA GTT CCG GGG AAA TGC CTT CCA GCG CCT GCA GCA GAC C 3′
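The cassettes listed above are pairs of oligonucleotides that anneal with staggered ends, so that neighbouring cassettes can be joined. As a rough sketch of how such a design can be checked, the code below takes the sequences of cassettes A and B from the list and locates where the reverse complement of each lower-strand oligonucleotide sits on the joined upper strand. The helper names are assumptions made for illustration. The two-base 5′ extension found at the start of zg1os1 is consistent with an NdeI-compatible end, since NdeI cuts CA^TATG.

```python
def clean(seq: str) -> str:
    """Remove the triplet spacing used in the listing and upper-case the sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def anneal_offset(upper: str, lower: str) -> int:
    """Index on `upper` where `lower` pairs, or -1 if it does not pair exactly."""
    return upper.find(reverse_complement(lower))


# Cassette A and cassette B, copied from the oligonucleotide list.
ZG1OS1 = clean("TAT GAC ACC ACT GGG TCC AGC TTC TTC TCT GCC GCA AAG")
SP1OS2 = clean("GCA GAG AAG AAG CTG GAC CCA GTG GTG TCA")
ZG2OS3 = clean("CTT TCT GTT GAA ATG TTT AGA ACA AGT TCG TAA AAT TCA AG")
SP2OS4 = clean("GAA CTT GTT CTA AAC ATT TCA ACA GAA AGC TTT GCG")

if __name__ == "__main__":
    upper_strand = ZG1OS1 + ZG2OS3  # upper strands joined in cassette order
    offset_a = anneal_offset(upper_strand, SP1OS2)
    offset_b = anneal_offset(upper_strand, SP2OS4)
    print("5' overhang of cassette A:", upper_strand[:offset_a])
    print("sp1os2 pairs at offset", offset_a, "for", len(SP1OS2), "nt")
    print("sp2os4 pairs at offset", offset_b, "for", len(SP2OS4), "nt")
```

In this pair of cassettes the lower strand of cassette B begins exactly where the lower strand of cassette A ends, and it spans the junction between zg1os1 and zg2os3, which is what holds the two cassettes together before ligation.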

Example 2 Expression of the Synthetic Gene Coding for hG-CSF in E. coli

The optimized gene Fopt5 was excised from the plasmid pCyΔH,H with the restriction enzymes NdeI and BamHI and subcloned into the final expression plasmid pET3a (Novagen, Madison, USA), which contains an ampicillin resistance gene. The resulting plasmid was then transformed into the production strain E. coli BL21 (DE3).

The cultures were prepared on a shaker at 160 rpm for 24 hours at 25° C. or for 15 hours at 42° C.:

-   in LBG10/amp100 medium (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, 100 mg/l ampicillin). The induction was performed by adding IPTG to a final concentration of 0.4 mM.

The cultures were prepared on a shaker for 24 hours at 160 rpm at 25° C.:

-   in GYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, metals in traces, 100 mg/l ampicillin). The induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.
-   in LYSP/amp100 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 6 g/l glycerol, 4 g/l lactose, metals in traces, 100 mg/l ampicillin). The induction was performed by adding lactose to the medium.
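The IPTG inductions above are specified only by the final concentration of 0.4 mM. As a small worked example, the sketch below computes how much stock solution gives that final concentration in a given culture volume. The 100 mM stock concentration and the 100 ml culture volume are assumptions chosen for illustration; neither value is stated in the text.

```python
def iptg_stock_volume_ml(culture_ml: float,
                         final_mm: float = 0.4,
                         stock_mm: float = 100.0) -> float:
    """Volume of IPTG stock (ml) to add to `culture_ml` of culture.

    Mass balance, accounting for the added volume v:
        stock_mm * v = final_mm * (culture_ml + v)
    The 100 mM default stock is an assumed value, not taken from the text.
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * culture_ml / (stock_mm - final_mm)


if __name__ == "__main__":
    volume = iptg_stock_volume_ml(culture_ml=100.0)
    print(f"add {volume:.3f} ml of stock to 100 ml of culture")
```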

The inoculum was prepared in LBG/amp100 medium (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 2.5 g/l glucose, 100 mg/l ampicillin) at 25° C. and 160 rpm overnight.

For analysis, 8 ml of the culture was centrifuged at 5000 rpm. The pellets were then resuspended in 10 mM Tris-HCl (pH 8.0) in a proportion of 0.66 ml of buffer per 1 calculated unit of OD_(600nm). This equalized the loaded amounts, since the final OD_(600nm) values of the cultures in the stated examples were not equal. The samples were mixed in a proportion of 3:1 with 4× SDS sample buffer containing DTT (pH 8.7), heated for 10 minutes at 95° C., centrifuged and loaded onto the gel.
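The normalization described above can be illustrated with a short calculation. The sketch assumes one reading of the text, namely that the pellet from the 8 ml sample is resuspended in 0.66 ml of buffer for every OD_(600nm) unit measured for the culture, and that the 3:1 mix means three parts of sample to one part of 4× sample buffer. The function names are hypothetical.

```python
BUFFER_ML_PER_OD_UNIT = 0.66  # from the text: 0.66 ml buffer per 1 unit OD600


def resuspension_volume_ml(od600: float) -> float:
    """Buffer volume (ml) for the pellet, scaled to the culture's OD600.

    Assumes the proportion applies to the measured OD600 of the culture.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return BUFFER_ML_PER_OD_UNIT * od600


def sample_buffer_volume_ul(sample_ul: float) -> float:
    """Volume of 4x sample buffer for a 3:1 (sample:buffer) mix, giving 1x."""
    return sample_ul / 3.0


if __name__ == "__main__":
    for od in (2.0, 5.0):
        print(f"OD600 {od}: resuspend in {resuspension_volume_ml(od):.2f} ml")
    print(f"30 ul sample + {sample_buffer_volume_ul(30.0):.0f} ul 4x buffer")
```

Under this reading, a denser culture is diluted into proportionally more buffer, so equal gel loads correspond to roughly equal amounts of cells.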

Samples from the various expression examples, using the optimized gene construct and the conventional hG-CSF cDNA, were compared by SDS-PAGE. The SDS-PAGE conditions were as follows, and the results are shown in FIGS. 3 and 4.

-   FIG. 3A: SDS-PAGE (4% stacking, 15% separating; stained with Coomassie brilliant blue) of the protein samples from the induced and non-induced cultures of the production strain E. coli BL21 (DE3) with the expression plasmid pET3a at 25° C. and 42° C. The cultures were cultivated in the LBG10/amp100 medium.

Legend:

Load 1: BL21 (DE3) pET3a-hG-CSF non-induced at 25° C. (10 μl) (no traces of hG-CSF)

Load 2: BL21 (DE3) pET3a-hG-CSF induced with IPTG at 25° C. (10 μl) (slight trace of hG-CSF)

Load 3: BL21 (DE3) pET3a-hG-CSF non-induced at 42° C. (10 μl) (no traces of hG-CSF)

Load 4: BL21 (DE3) pET3a-hG-CSF induced with IPTG at 42° C. (10 μl) (under 1% hG-CSF)

Load 5: standard filgrastim 0.3 μg for Coomassie brilliant blue

Load 6: BL21 (DE3) pET3a-Fopt5 non-induced at 25° C. (5 μl) (6% hG-CSF)

Load 7: BL21 (DE3) pET3a-Fopt5 induced with IPTG at 25° C. (5 μl) (over 50% hG-CSF)

-   FIG. 3B: Detection with antibodies (Western blot); primary rabbit antibodies; secondary goat anti-rabbit IgG antibodies conjugated with horseradish peroxidase; substrate β-naphthol.

The samples for the detection with antibodies were loaded in the same amounts and in the same order as for SDS-PAGE (FIG. 3A), except for the standard, of which 0.08 μg was loaded.

-   FIG. 4: SDS-PAGE (4% stacking, 15% separating; stained with Coomassie brilliant blue) of the protein samples from the induced cultures of the production strain E. coli BL21 (DE3) with the expression plasmid pET3a at 25° C. The cultures were cultivated in GYSP/amp100 and LYSP/amp100 medium.

Legend:

Load 1: LMW (BioRad)

Load 2: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in LYSP/amp100; (60% hG-CSF)

Load 3: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in LYSP/amp100; (over 54% hG-CSF)

Load 4: rhG-CSF (0.6 μg)

Load 5: rhG-CSF (1.5 μg)

Load 6: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in GYSP/amp100 (4 μl); (55% hG-CSF)

Load 7: BL21 (DE3) pET3a/P-Fopt5, the culture cultivated in GYSP/amp100 (5 μl); (52% hG-CSF)

The content (%) of accumulated hG-CSF found in the form of inclusion bodies for the native and the optimized gene is given in Table 1.

TABLE 1 Comparison of the accumulation levels of hG-CSF for the native and the optimized gene (Fopt5)

hG-CSF content (%) in total proteins, by gene and cultivation temperature:

Expression system          Cultivation and induction conditions   Native gene, 25° C.   Native gene, 42° C.   Fopt5, 25° C.
-------------------------  -------------------------------------  --------------------  --------------------  --------------
E. coli BL21 (DE3) pET3a   medium LBG10/amp100, 0.4 mM IPTG       traces                <1%                   >40%
E. coli BL21 (DE3) pET3a   medium GYSP/amp100, 0.4 mM IPTG        <1%                   <1%                   >52%
E. coli BL21 (DE3) pET3a   medium LYSP/amp100                     <1%                   <1%                   >52%

The indicated values for the hG-CSF content were obtained by densitometric analysis of SDS-PAGE gels stained with Coomassie brilliant blue in the case of Fopt5 (FIG. 3A and FIG. 4) and by detection with antibodies in the case of the unoptimized gene (FIG. 3B). In the case of Fopt5, the relative amount of hG-CSF used to estimate the expression level was determined by profile analysis (program Molecular analyst; BioRad) of the gels, using the Imaging densitometer Model GS670 (BioRad).
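The principle behind such a densitometric estimate can be sketched as follows: the lane is reduced to an intensity profile, and the hG-CSF content is the share of the background-corrected signal that falls inside the hG-CSF band. This is only an illustration of the general idea. The algorithm actually used by the BioRad software is not described in the text, and the profile values below are hypothetical, not measured data.

```python
from typing import Sequence


def band_percentage(profile: Sequence[float],
                    band_start: int,
                    band_end: int,
                    baseline: float = 0.0) -> float:
    """Percentage of total lane signal inside profile[band_start:band_end].

    `baseline` is subtracted from every point; negative values are clipped
    to zero so that background noise does not count against the total.
    """
    corrected = [max(value - baseline, 0.0) for value in profile]
    total = sum(corrected)
    if total == 0:
        raise ValueError("lane profile contains no signal above baseline")
    return 100.0 * sum(corrected[band_start:band_end]) / total


if __name__ == "__main__":
    # Hypothetical lane profile (arbitrary units), one dominant band.
    lane = [1.0, 1.0, 2.0, 9.0, 9.0, 2.0, 1.0, 1.0]
    share = band_percentage(lane, band_start=3, band_end=5, baseline=1.0)
    print(f"band share of lane signal: {share:.1f}%")
```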

The results show a drastically improved expression level when the optimized synthetic gene Fopt5 was used.

Example 3 Expression of the Synthetic Gene Coding for hG-CSF in E. coli (Kanamycin Resistance)

The optimized gene Fopt5 was excised from the plasmid pET3a/P-Fopt5, which bears the ampicillin resistance, with the restriction enzymes NdeI and BamHI and subcloned into the final expression plasmid pET9a, which bears the kanamycin resistance (Novagen, Madison, USA). The resulting plasmid was then transformed into the production strain E. coli BL21 (DE3).

The cultures were prepared on a shaker at 160 rpm for 24-30 h at 25° C.:

-   in GYSP/kan30 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, metals in traces, 30 mg/l kanamycin). The induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.
-   in GYSP/kan15 medium (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, metals in traces, 15 mg/l kanamycin). The induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.
-   in GYSP medium without the addition of an antibiotic (20 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 10 g/l glucose, metals in traces). The induction was performed by adding IPTG to the medium to a final concentration of 0.4 mM.

The inoculum was prepared in LBPG/kan30 medium (10 g/l phytone, 5 g/l yeast extract, 10 g/l NaCl, 2.5 g/l glucose, 30 mg/l kanamycin) at 25° C. and 160 rpm overnight.

For SDS-PAGE analysis (estimation of the hG-CSF content, i.e. the expression level), 8 ml of the culture was centrifuged at 5000 rpm. The pellets were then resuspended in 10 mM Tris-HCl (pH 8.0) in a proportion of 0.66 ml of buffer per 1 calculated unit of OD_(600nm).

The samples were mixed in a proportion of 3:1 with 4× SDS sample buffer containing DTT (pH 8.7), heated for 10 minutes at 95° C. and centrifuged, and the clear supernatant was loaded onto the gel. The content (%) of the accumulated hG-CSF found in the form of inclusion bodies for the optimized gene is given in Table 2.

TABLE 2 Accumulation level of hG-CSF for the optimized gene (Fopt5) in the pET9a vector bearing the kanamycin resistance

Cultivation temperature 25° C. in all cases:

Expression system                Cultivation and induction conditions   hG-CSF content (%) in total proteins
-------------------------------  -------------------------------------  -------------------------------------
E. coli BL21 (DE3) pET9a-Fopt5   medium GYSP/kan30, 0.4 mM IPTG         >52%
E. coli BL21 (DE3) pET9a-Fopt5   medium GYSP/kan15, 0.4 mM IPTG         >53%
E. coli BL21 (DE3) pET9a-Fopt5   medium GYSP, 0.4 mM IPTG               >53%

FIG. 5 shows the SDS-PAGE (4% stacking, 15% separating; stained with Coomassie brilliant blue) of the protein samples from the induced cultures of the production strain E. coli BL21 (DE3) with the expression plasmid pET9a-Fopt5 at 25° C. The cultures were cultivated at two different kanamycin concentrations and without kanamycin, specifically in GYSP/kan30, GYSP/kan15 and GYSP medium.

Legend:

Lane 1: LMW (BioRad)

Lane 2: BL21 (DE3) pET9a-Fopt5 in GYSP/kan30 medium induced with IPTG at 25° C. (5 μl) (above 52% hG-CSF)

Lane 3: LMW (BioRad)

Lane 4: BL21 (DE3) pET9a-Fopt5 in GYSP/kan15 medium induced with IPTG at 25° C. (5 μl) (above 54% hG-CSF)

Lane 5: BL21 (DE3) pET9a-Fopt5 in GYSP medium induced with IPTG at 25° C. (5 μl) (above 53% hG-CSF)

Lane 6: hG-CSF standard

Lane 7: LMW (BioRad)

The hG-CSF contents cited above were obtained by densitometric analysis of the SDS-PAGE gels stained with Coomassie brilliant blue. The relative amount of hG-CSF used to estimate the expression level was determined by profile analysis (program Molecular analyst; BioRad) of the gels, using the Imaging densitometer Model GS670 (BioRad).

The results show that the accumulation of hG-CSF is of the same order (more than 53%) in the culture without kanamycin, i.e. without selection pressure. This indicates that the strain is particularly suitable for use on the industrial scale.

1. A DNA sequence coding for hG-CSF characterized in that the sequence comprises the nucleotide sequence of SEQ ID: 1.

2. A DNA sequence characterized in that the sequence comprises a nucleotide sequence selected from the group consisting of a combination of the following modifications with respect to the native hG-CSF sequence: in a “segment I” (located at the 5′ terminal end between the nucleotide positions 3 and 194): a plurality of replacements which include replacements of E. coli rare codons by E. coli preference codons and replacements of GC rich regions by AT rich regions, in a “segment II” (located between the nucleotide positions 194 and 309): a plurality of replacements of E. coli rare codons by E. coli preference codons, in a “segment III” (located between the nucleotide positions 309 and 467): no change or essentially no change, in a “segment IV” (located at the 3′ terminal end between the nucleotide positions 467 and 536): a plurality of replacements of E. coli rare codons by E. coli preference codons.

3. The DNA sequence according to claim 2, which encodes for a biologically active G-CSF.

4. The DNA sequence according to claim 3, wherein the nucleotide sequence is capable of providing an expression level of G-CSF, to the total proteins after expression, of at least 50% in an expression system.

5. The DNA sequence according to claim 2, further comprising the 5′-untranslated region of the hG-CSF gene which is not changed relative to the native hG-CSF gene.

6. An expression plasmid, wherein the plasmid comprises the DNA sequence according to claim 1 and a plasmid vector.

7. An expression plasmid, wherein the plasmid comprises a DNA sequence according to claim 2 and a plasmid vector.

8. An expression plasmid according to claim 6, wherein the plasmid vector comprises a T7 promoter sequence.

9. An expression plasmid according to claim 6, wherein the plasmid vector is selected from the group of pET vectors.

10. An expression plasmid according to claim 6, characterized in that the plasmid vector comprises a resistance gene selected from the group consisting of ampicillin and kanamycin.

11. An expression system for the expression of DNA sequence coding for hG-CSF characterized in that the sequence comprises the nucleotide sequence of SEQ ID: 1, wherein the system comprises the expression plasmid according to claim 6 and a production strain E. coli.

12. (canceled)

13. The expression system according to claim 11, characterized in that the production strain is E. coli BL21 (DE3).

14. The expression system according to claim 13, wherein it is used without an antibiotic.

15. A process for construction of DNA sequence according to claim 1, wherein the process comprises (i) applying methods in order to provide a DNA sequence which is changed relative to the native sequence coding for hG-CSF by: replacement of some E. coli rare codons with E. coli preference codons, and/or replacement of some GC rich regions with AT rich regions; and (ii) maintaining a completely unchanged part in a substantial portion of the native sequence coding for hG-CSF.

16. A process for construction of DNA sequence according to claim 15, wherein the DNA sequence further comprises 5′-untranslated region of the hG-CSF gene, wherein the process does not involve changes in the 5′-untranslated region in one or more of the following partial regions: translation initiation region, ribosome binding site and the region between the start codon and the ribosome binding site.

17. The process for construction of DNA sequence according to claim 15, wherein a completely unchanged sequence according to (ii) is maintained in segment III in a sequence of at least 99 nucleotides in length.

18. The process for construction of DNA sequence according to claim 15, further comprising inserting said constructed DNA sequence into a plasmid vector which comprises a T7 promoter sequence.

19. The process for construction of DNA sequence according to claim 15, which constructed DNA sequence is capable of providing an expression level, to the total proteins after expression, of at least 50% in a suitable expression system.

20. A process for the expression of hG-CSF, comprising expressing the DNA sequence according to the expression plasmid according to claim 6 in E. coli.

21. The process for the expression of hG-CSF according to claim 20, wherein IPTG is used for induction at a concentration in the range of at least 0.1 mM to less than 1 mM.

22. The process according to claim 20, which comprises a fermentation step that is performed at a temperature of about 20° C. to 30° C.

23. (canceled)

24. A process for the manufacture of a pharmaceutical composition comprising hG-CSF or biologically active G-CSF, wherein said process comprises: (a) carrying out a process according to claim 20, (b) isolating and/or purifying the hG-CSF or biologically active G-CSF obtained by step (a), and (c) mixing the isolated and/or purified hG-CSF or biologically active G-CSF with a pharmaceutically acceptable carrier or auxiliary substance.